Dimensionality Reduction for Text using Domain Knowledge

نویسندگان

  • Yi Mao
  • Krishnakumar Balasubramanian
  • Guy Lebanon
چکیده

Text documents are complex high dimensional objects. To effectively visualize such data it is important to reduce its dimensionality and visualize the low dimensional embedding as a 2-D or 3-D scatter plot. In this paper we explore dimensionality reduction methods that draw upon domain knowledge in order to achieve a better low dimensional embedding and visualization of documents. We consider the use of geometries specified manually by an expert, geometries derived automatically from corpus statistics, and geometries computed from linguistic resources.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Linguistic Geometries for Unsupervised Dimensionality Reduction

Text documents are complex high dimensional objects. To effectively visualize such data it is important to reduce its dimensionality and visualize the low dimensional embedding as a 2-D or 3-D scatter plot. In this paper we explore dimensionality reduction methods that draw upon domain knowledge in order to achieve a better low dimensional embedding and visualization of documents. We consider t...

متن کامل

Transfer Learning via Dimensionality Reduction

Transfer learning addresses the problem of how to utilize plenty of labeled data in a source domain to solve related but different problems in a target domain, even when the training and testing problems have different distributions or features. In this paper, we consider transfer learning via dimensionality reduction. To solve this problem, we learn a low-dimensional latent feature space where...

متن کامل

Text Document Clustering Using Dimension Reduction Technique

Text document clustering is used to group a set of documents based on the information it contains and to provide retrieval results when a user browses the internet. Experimental evidences have shown that Information Retrieval applications can benefit from document clustering and it has been used as a tool to improve the performance of retrieval of information. Information retrieval is an interd...

متن کامل

2D Dimensionality Reduction Methods without Loss

In this paper, several two-dimensional extensions of principal component analysis (PCA) and linear discriminant analysis (LDA) techniques has been applied in a lossless dimensionality reduction framework, for face recognition application. In this framework, the benefits of dimensionality reduction were used to improve the performance of its predictive model, which was a support vector machine (...

متن کامل

Rigorous dimensionality reduction through linguistically motivated feature selection for text categorization

This paper introduces a new linguistically motivated feature selection technique for text categorization based on morphological analysis. It will be shown that compound parts that are constituents of many (different) noun compounds throughout a text are good and general indicators of this text’s content; they are more general in meaning than the compounds they are part of, but nevertheless have...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010